Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges, including poor interpretability, weak reasoning capability, and the need for large amounts of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome these limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The security of artificial intelligence (AI) is an important research area on the path towards safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
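Recently, the vision transformer and its variants have played an increasingly important role in both monocular and multi-view human pose estimation. Treating image patches as tokens, transformers can model global dependencies within the entire image or across images from other views. However, global attention is computationally expensive. As a consequence, it is difficult to scale these transformer-based methods up to high-resolution features and many views. In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and performs self-attention only within the selected tokens. Furthermore, we extend PPT to multi-view human pose estimation. Building on PPT, we propose a new cross-view fusion strategy, called human area fusion, which considers all human foreground pixels as corresponding candidates. Experimental results on COCO and MPII demonstrate that our PPT can match the accuracy of previous pose transformer methods while reducing computation. Moreover, experiments on Human 3.6M and Ski-Pose demonstrate that our multi-view PPT can efficiently fuse cues from multiple views and achieve new state-of-the-art results.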
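The idea of restricting self-attention to a selected subset of tokens can be illustrated with a minimal sketch. This is not the PPT implementation: the per-token foreground scores are assumed to be given, the attention has no learned projections, and the names `select_tokens`, `self_attention`, `pruned_attention`, and `keep_ratio` are hypothetical.

```python
import math


def select_tokens(scores, keep_ratio):
    """Return the indices of the highest-scoring tokens, in original order."""
    k = max(1, int(len(scores) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def self_attention(tokens):
    """Single-head dot-product self-attention without learned projections."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        logits = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        outputs.append(
            [
                sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
                for d in range(dim)
            ]
        )
    return outputs


def pruned_attention(tokens, scores, keep_ratio=0.5):
    """Attend only among the kept tokens (assumed foreground), skipping the rest."""
    kept = select_tokens(scores, keep_ratio)
    attended = self_attention([tokens[i] for i in kept])
    return kept, attended


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    scores = [0.9, 0.1, 0.8, 0.2]  # hypothetical foreground scores
    kept, attended = pruned_attention(tokens, scores)
    print(kept, attended)
```

With `n` tokens and `k` kept tokens, the number of pairwise attention terms in this sketch drops from `n * n` to `k * k`, which is the kind of saving the abstract refers to.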
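This work considers the task of representation learning on the attributed relational graph (ARG). Both the nodes and the edges in an ARG are associated with attributes/features, allowing ARGs to encode the rich structural information widely observed in real applications. Existing graph neural networks offer limited ability to capture complex interactions within local structural contexts, which hinders them from exploiting the expressive power of ARGs. We propose the Motif Convolution Module (MCM), a new motif-based graph representation learning technique that better utilizes local structural information. The ability to handle continuous edge and node features is one of MCM's advantages over existing motif-based models. MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context of individual nodes, which is then used to learn higher-level node representations via a multilayer perceptron and message passing in graph neural networks. Compared with other graph learning approaches on classifying synthetic graphs, our approach is substantially better at capturing structural context. We also demonstrate the performance and explainability advantages of our approach by applying it to several molecular benchmarks.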
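Due to the over-smoothing issue, most existing graph neural networks can only capture limited dependencies with their inherently finite aggregation layers. To overcome this limitation, we propose a new kind of graph convolution, called Graph Implicit Nonlinear Diffusion (GIND), which implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. Notably, we show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective. With this property, we can theoretically characterize the equilibrium of GIND from an optimization perspective. More interestingly, we can induce new structural variants by modifying the corresponding optimization objective. Specifically, we can embed prior properties into the equilibrium and introduce skip connections to promote training stability. Extensive experiments show that GIND is good at capturing long-range dependencies and performs well on both homophilic and heterophilic graphs with nonlinear diffusion. Moreover, we show that the optimization-induced variants of our model can boost performance and improve training stability and efficiency. As a result, GIND obtains significant improvements on both node-level and graph-level tasks.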
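(Source) code summarization aims to automatically generate natural-language summaries/comments for a given code snippet. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet, and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, and consequently the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models from the neural machine translation domain. The generated summaries, however, often miss important factual details. To generate human-written-like summaries that preserve factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs extractive code summarization: it takes in the code snippet and predicts important statements containing key factual details. The abstractive module performs abstractive code summarization: it takes in the entire code snippet and the important statements in parallel and generates a succinct, human-written-like natural-language summary. We evaluate the effectiveness of our technique, called EACS, through extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L.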
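Depth maps are used in a wide range of applications, from 3D rendering to 2D image effects such as bokeh. However, those predicted by single image depth estimation (SIDE) models often fail to capture isolated holes in objects and/or have inaccurate boundary regions. Meanwhile, masks are much easier to obtain, using commercial auto-masking tools, off-the-shelf segmentation and matting methods, or even manual editing. Hence, in this paper, we formulate a novel problem of mask-guided depth refinement, which uses a generic mask to refine the depth prediction of SIDE models. Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask. As datasets with both depth and mask annotations are scarce, we propose a self-supervised learning scheme that uses arbitrary masks and RGB-D datasets. We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in the inner and outer mask boundary regions. We further analyze our model with an ablation study and demonstrate results on real applications. More information can be found at https://sooyekim.github.io/maskdepth/.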
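Large language models such as GPT-3 and PaLM have shown remarkable performance in few-shot learning. However, they still struggle with reasoning tasks such as the arithmetic benchmark GSM8K. Recent advances deliberately guide the language model to generate a chain of reasoning steps before producing the final answer, successfully raising the problem-solving rate on the GSM8K benchmark from 17.9% to 58.1%. In this paper, we propose a new approach, DiVeRSe (Diverse Verifier on Reasoning Step), to further advance their reasoning capability. First, DiVeRSe explores different prompts to enhance the diversity of reasoning paths. Second, DiVeRSe introduces a verifier to distinguish good answers from bad ones, enabling better weighted voting. Finally, DiVeRSe verifies the correctness of each individual step rather than of all the steps as a whole. We conduct extensive experiments using the latest language model davinci-002 and demonstrate that DiVeRSe achieves new state-of-the-art performance on six out of eight reasoning benchmarks (e.g., GSM8K from 74.4% to 83.2%), outperforming the PaLM model with 540B parameters.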
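The difference between plain majority voting and verifier-weighted voting over sampled reasoning paths can be sketched as follows. This is an illustrative sketch only, not the DiVeRSe implementation: the verifier scores are assumed to be given as numbers in [0, 1], and the function names `majority_vote` and `weighted_vote` are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(candidates):
    """Pick the final answer that appears most often, ignoring verifier scores."""
    counts = Counter(answer for answer, _ in candidates)
    return counts.most_common(1)[0][0]


def weighted_vote(candidates):
    """Pick the final answer whose reasoning paths have the largest total score."""
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Each pair is (final answer of one reasoning path, assumed verifier score).
    sampled_paths = [
        ("20", 0.2),
        ("20", 0.1),
        ("20", 0.2),
        ("18", 0.9),
        ("18", 0.8),
    ]
    print(majority_vote(sampled_paths))
    print(weighted_vote(sampled_paths))
```

In this toy input the most frequent answer and the answer with the highest total verifier score differ, which is the situation where weighting by a verifier can change the outcome.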
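The synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best reproduces a given sound timbre is an important yet complicated problem, known as the synthesizer parameter estimation problem. We propose Sound2Synth, a multi-modal deep-learning-based pipeline, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method not only achieves state-of-the-art (SOTA) results but also delivers the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.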